Abstract

We show that there exists a Boolean function $F$ which exhibits the following separations among
deterministic query complexity ($D(F)$), randomized zero-error query complexity ($R_0(F)$) and
randomized one-sided error query complexity ($R_1(F)$): $R_1(F) = \widetilde{O}(\sqrt{D(F)})$ and
$R_0(F) = \widetilde{O}(D(F)^{3/4})$. This refutes the conjecture made by Saks and Wigderson that for
any Boolean function $f$, $R_0(f) = \Omega(D(f)^{0.753\ldots})$. This also shows the widest separation
between $R_1(f)$ and $D(f)$ for any Boolean function. The function $F$ was defined by Göös, Pitassi
and Watson, who studied it for showing a separation between deterministic decision tree complexity
and unambiguous non-deterministic decision tree complexity. Independently of us, Ambainis et al.
proved that different variants of the function $F$ certify an optimal (quadratic) separation between
$D(f)$ and $R_0(f)$, and a polynomial separation between $R_0(f)$ and $R_1(f)$. Viewed as separation
results, our results are subsumed by those of Ambainis et al. However, while the functions
considered in the work of Ambainis et al. are different variants of $F$, we work with the original
function $F$ itself.


1 Introduction 


The model of decision trees is one of the simplest models of computation. In this model, an algorithm
for computing a Boolean function is given query access to the input. The algorithm queries different bits
of the input, possibly in an adaptive fashion, and eventually outputs a bit. The objective is to minimize
the number of queries made. The amount of computation is generally not the quantity of interest in
this model. For a Boolean function $f$, the deterministic query complexity $D(f)$ of $f$ is defined to be
the maximum (over inputs) number of queries the best deterministic query algorithm for $f$ makes. The
bounded-error randomized query complexity $R(f)$ of $f$ is defined to be the number of queries made
on the worst input by the best randomized query algorithm for $f$ that is correct with high probability
on every input. $R_0(f)$, the zero-error randomized query complexity of $f$, is the expected number of
queries made on the worst input by the best randomized algorithm for $f$ that gives the correct answer on
each input with probability 1. Finally $R_1(f)$, the one-sided randomized query complexity of $f$, is the
number of queries made on the worst input by the best algorithm that is correct on every input with
high probability, and in addition correct on every 1-input with probability 1. We give formal definitions
of these measures in the next section.

*S. Mukhopadhyay is supported by a TCS Fellowship. 
The relations between these query complexity measures have been extensively studied in the
literature. That randomization can save more than a constant factor of queries has been known for a
long time. In their 1986 paper, Saks and Wigderson [SW86] gave examples of recursive NAND trees
and recursive MAJORITY trees, for which they credited Snir and Ravi Boppana respectively. In both
these functions, the deterministic and randomized zero-error query complexity are polynomially
separated. In the same paper, Saks and Wigderson studied binary uniform NAND trees, and showed
that $R_0(F) = \Theta(D(F)^{0.753\ldots})$ where $F$ is the binary uniform NAND tree function. They also
conjectured that this is the widest separation possible between these two measures of complexity for any
Boolean function. For the same function, Santha [San91] showed that $R(F) = (1 - 2\epsilon) R_0(F)$ where
$\epsilon$ is the error probability. So for this function we have $R(F) = \Theta(D(F)^{0.753\ldots})$. It is easy to see
that $D(f) \ge R(f), R_0(f), R_1(f)$. Blum and Impagliazzo [BI87], Tardos [Tar89] and Hartmanis and
Hemachandra [HH87] independently showed that $R_0(f) \ge \sqrt{D(f)}$. Nisan [Nis91] showed that for any
Boolean function $f$, $D(f) \le 27 R(f)^3$ and $D(f) \le R_1(f)^2$. The biggest gap known so far between $D(f)$
and $R(f)$ for any $f$ is much less than cubic.

During this work, we came to know of the recent work by Ambainis et al. [ABB+15]. In this work the
authors prove various separation results between different query complexity measures. Among several
other results, the authors prove:

1. There exists a function $f$ for which $R_0(f) = \widetilde{O}(\sqrt{D(f)})$. In view of the lower bound, this is the
widest separation possible between these two measures. This refutes the conjecture by Saks and
Wigderson.

2. There exists a function $f$ for which $R_0(f) = \widetilde{\Omega}(R_1(f)^{3/2})$.

1.1 Query Models 

Deterministic query complexity. A deterministic query algorithm can be thought of as a rooted
binary tree where each internal node is labeled with a variable and each leaf is labeled with 0 or 1. The
algorithm starts by querying the variable at the root of the tree and, depending on the value it gets, it
chooses between the left child and the right child, and thus goes down the tree recursively. If the value of
the variable at an internal node is 0, the algorithm descends to its left child, otherwise it descends to its
right child. Whenever the algorithm reaches a leaf, it outputs the value of the leaf and terminates. We
say that the query algorithm correctly computes $f$ if for every input $x \in \{0,1\}^n$, the algorithm outputs
$f(x)$. The deterministic query complexity of a function $f$ is defined as follows.


$$D(f) = \min_T \mathrm{Depth}(T), \qquad (1)$$

where $T$ ranges over decision trees which correctly compute $f$.
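
As an illustration of this definition, here is a minimal Python sketch. The nested-tuple encoding of a tree and the names `evaluate`, `depth` and `computes` are assumptions made for this example and do not come from the paper.

```python
from itertools import product

# Hypothetical encoding (not from the paper): a leaf is the int 0 or 1,
# an internal node is (variable_index, left_subtree, right_subtree).

def evaluate(tree, x):
    """Run the decision tree on input x; return (output, number_of_queries)."""
    queries = 0
    while not isinstance(tree, int):
        var, left, right = tree
        queries += 1
        tree = right if x[var] else left  # value 0 -> left child, 1 -> right child
    return tree, queries

def depth(tree):
    """Depth(T): the maximum number of queries on any root-to-leaf path."""
    if isinstance(tree, int):
        return 0
    _, left, right = tree
    return 1 + max(depth(left), depth(right))

def computes(tree, f, n):
    """Check that the tree outputs f(x) on every x in {0,1}^n."""
    return all(evaluate(tree, x)[0] == f(x) for x in product((0, 1), repeat=n))

# A tree for the AND of two bits: query x0; only if it is 1, query x1.
and_tree = (0, 0, (1, 0, 1))
```

With this encoding, $D(f)$ is the smallest `depth(tree)` over all trees for which `computes(tree, f, n)` holds; the sketch only checks a given tree and does not search for the best one.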


Randomized query complexity. A randomized query algorithm can be thought of as a distribution
over deterministic query algorithms. A randomized query algorithm can also be viewed as a query
algorithm where each node has the additional power of tossing coins. After querying the variable
associated with an internal node of the tree, the algorithm decides which input bit to query next depending
on the responses to the queries so far (i.e. the current node in the tree) and the value of the coin tosses
made at that node. It is not hard to see that the two definitions are equivalent. In the first measure of
complexity that we consider, we do not allow the algorithm to make errors, and we
measure the expected number of queried variables on an input. Let us denote the expected number of
variables queried by algorithm $A$ for evaluating $f$ on input $x$ by $Q(A, x)$. The zero-error randomized
query complexity of $f$, denoted by $R_0(f)$, is defined as follows:


$$R_0(f) = \min_A \max_x Q(A, x), \qquad (2)$$

where $A$ ranges over all randomized query algorithms which correctly compute $f$ on every input. Note
that the expectation is taken over the random coin tosses. Another notion of complexity is
randomized bounded-error query complexity, where we allow the query algorithm to err on inputs and
we look at the maximum number of queries on any input. We say that a randomized query algorithm
$A$ computes $f$ with error probability $\delta$ if for every input $x$, $\Pr_R[A(x) \ne f(x)] < \delta$. The bounded-error
randomized query complexity of $f$, denoted by $R_\delta(f)$, is defined as follows.


$$R_\delta(f) = \min_A \mathrm{Depth}(A_T), \qquad (3)$$

where $A_T$ denotes the support of the distribution of binary trees associated with $A$ and we take the
minimum over those $A$'s which compute $f$ with error probability $\delta$. The depth of a collection of trees
is interpreted as the maximum depth of any tree in that collection. Since the randomized bounded-error
query complexities of $f$ for any two constant error values are within a constant multiplicative factor of
each other, we drop the subscript $\delta$ whenever convenient, and call it $R(f)$.

A third notion of query complexity is randomized one-sided error query complexity. An input $x$ is said to be
a 0-input (1-input) of a function $f$ if $f(x) = 0$ ($f(x) = 1$). We say that a randomized one-sided error
query algorithm $A$ computes $f$ with error probability $\delta$ if for every 1-input $x$, $\Pr_R[A(x) \ne f(x)] = 0$ and
for every 0-input $x$, $\Pr_R[A(x) \ne f(x)] < \delta$. The one-sided error randomized query complexity of $f$,
denoted by $R_{1,\delta}(f)$, is defined as follows.

$$R_{1,\delta}(f) = \min_A \mathrm{Depth}(A_T), \qquad (4)$$

where $A_T$ denotes the support of the distribution of binary trees associated with $A$ and we take the
minimum over those $A$'s which compute $f$ with error probability $\delta$. Since the one-sided error randomized
query complexities of $f$ for any two constant error values are within a constant multiplicative factor of
each other, we drop the subscript $\delta$ whenever convenient, and call it $R_1(f)$.

For our zero-error algorithm we will use the following simple fact: For any Boolean function $f$,
$R_0(f) = O(\max\{R_1(f), R_1(\bar{f})\})$, where $\bar{f}$ denotes the negation of $f$.


1.2 Our results 


In this work we prove the following results. 

Theorem 1. There exists a Boolean function $F$ for which $R_0(F) = \widetilde{O}(D(F)^{3/4})$.

Theorem 1 refutes the conjecture made by Saks and Wigderson [SW86], though this result does not
match the lower bound on $R_0(f)$ in terms of $D(f)$. As mentioned in the Introduction, Ambainis et al.
[ABB+15] exhibit a function that certifies a quadratic separation between $R_0(f)$ and $D(f)$, which is the
widest possible in view of the matching lower bound.

Theorem 2. There exists a Boolean function $F$ for which $R_1(F) = \widetilde{O}(\sqrt{D(F)})$.

This separation matches the lower bound, up to logarithmic factors, on $R_1(f)$ in terms of $D(f)$ for any
function $f$. However, since $R_0(f) \ge R_1(f)$, the function used by Ambainis et al. [ABB+15] also certifies
the same separation. Thus, viewed as separation results, our results are subsumed by those of Ambainis
et al. [ABB+15].

The function $F$ in Theorems 1 and 2 is the same, and was first defined by Göös et al. [GPW15] for
showing a gap between deterministic decision tree complexity and unambiguous non-deterministic
decision tree complexity. While the functions used by Ambainis et al. are different variants of this function,
we work with the original function itself.

We define the function $F$ now. The domain of $F$ is $\mathcal{D} = \{0,1\}^{n(1+\lceil \log n \rceil)}$. An input $M \in \mathcal{D}$ to $F$ is
viewed as a matrix of dimension $\sqrt{n} \times \sqrt{n}$. Each cell $M_{i,j}$ of $M$ consists of two parts:

1. A bit-entry $b_{i,j} \in \{0,1\}$.

2. A pointer-entry $p_{i,j} \in \{0,1\}^{\lceil \log n \rceil}$. $p_{i,j}$ is either a valid pointer to some other cell of $M$, or is
interpreted as $\perp$ (null). If $p_{i,j}$ is not a valid pointer to some other cell, we write "$p_{i,j} = \perp$".

Now, we define what we call a valid pointer chain. Assume that $t = \sqrt{n}$. For an input $M$ to $F$, a sequence
$((i_1, j_1), \ldots, (i_t, j_t))$ of indices in $[\sqrt{n}] \times [\sqrt{n}]$ is called a valid pointer chain if:

1. $b_{i_1,j_1} = 1$;

2. $b_{i_2,j_2} = \cdots = b_{i_t,j_t} = 0$;

3. $\forall k < i_1$, $p_{k,j_1} = \perp$;

4. for $\ell = 1, \ldots, t-1$, $p_{i_\ell,j_\ell} = (i_{\ell+1}, j_{\ell+1})$ and $p_{i_t,j_t}$ is $\perp$.

$F$ evaluates to 1 on $M$ iff the following is true:

1. $M$ contains a unique all 1's column $j_1$, i.e., there exists $j_1 \in [\sqrt{n}]$ such that $\forall i \in [\sqrt{n}]$, $b_{i,j_1} = 1$.

2. There exists a valid pointer chain $((i_1, j_1), \ldots, (i_t, j_t))$. This means that the column $j_1$ has a cell
with a non-null pointer-entry. $(i_1, j_1)$ is the cell on column $j_1$ with minimum row index whose
pointer-entry is non-null. Starting from $p_{i_1,j_1}$, if we follow the pointer, the following conditions
are satisfied: In each step except the last, the cell reached by following the pointer-entry of the cell
in the previous step contains a 0 as bit-entry and a non-null pointer as pointer-entry. In the last
step, the cell contains a 0 as bit-entry and a null pointer ($\perp$) as pointer-entry. Also, this pointer
chain covers all columns of $M$.
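
The definition of $F$ can be sketched in Python as follows. This is an illustrative sketch under assumptions that are not in the paper: the input is given as two 0-indexed square lists `bits` and `ptr`, a pointer-entry is stored directly as a pair `(row, column)` or `None` for $\perp$ (rather than as $\lceil \log n \rceil$ bits), and the name `evaluate_F` is hypothetical.

```python
def evaluate_F(bits, ptr):
    """Evaluate F on a side x side matrix, where side = sqrt(n).

    bits[i][j] is the bit-entry of cell (i, j); ptr[i][j] is its pointer-entry,
    either a pair (i2, j2) or None for the null pointer.
    """
    side = len(bits)
    # Condition 1: there must be a unique all-1's column j1.
    ones_cols = [j for j in range(side)
                 if all(bits[i][j] == 1 for i in range(side))]
    if len(ones_cols) != 1:
        return 0
    j1 = ones_cols[0]
    # The chain starts at the first cell of column j1 with a non-null pointer.
    start = next((i for i in range(side) if ptr[i][j1] is not None), None)
    if start is None:
        return 0
    # Follow the pointers for side - 1 hops; every cell reached must have bit 0.
    visited_cols = {j1}
    cell = (start, j1)
    for _ in range(side - 1):
        cell = ptr[cell[0]][cell[1]]
        if cell is None or bits[cell[0]][cell[1]] != 0:
            return 0
        visited_cols.add(cell[1])
    # The last cell must hold a null pointer and the chain must cover all columns.
    if ptr[cell[0]][cell[1]] is not None:
        return 0
    return 1 if len(visited_cols) == side else 0
```

For example, with `bits = [[1, 0], [1, 0]]` and `ptr = [[None, None], [(0, 1), None]]`, column 0 is the all 1's column, its first non-null pointer sits in row 1, and the chain visits the 0-cell `(0, 1)` and ends there.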


By a simple adversarial strategy, Göös et al. [GPW15] showed that $D(F) = \Omega(n)$. Our contribution is
to show the following results.

Lemma 3. For the function $F$ defined above, $R_0(F) = \widetilde{O}(n^{3/4})$.

Lemma 4. For the function $F$ defined above, $R_1(F) = \widetilde{O}(\sqrt{n})$.

Clearly, Lemmas 3 and 4 imply Theorems 1 and 2 respectively.


2 Intuition of the Randomized One-sided Error Query Algorithm for $F$

We show that the randomized one-sided error query complexity of $F$ is $\widetilde{O}(\sqrt{n})$. In this section we
provide intuition for our one-sided error algorithm for $F$. Our algorithm errs on one side: on 0-inputs
it always outputs 0 and on 1-inputs it outputs 1 with high probability.

The algorithm attempts to find a 1-certificate. If it fails to find a 1-certificate, it outputs 0. We show that
on every 1-input, with high probability, the algorithm succeeds in finding a 1-certificate. The 1-certificate
our algorithm looks for consists of:

1. A column $j$, all of whose bit-entries are 1's.

2. All null pointers of column $j$ till its first non-null pointer-entry.

3. The pointer chain of length $\sqrt{n}$ that starts from the first non-null pointer-entry, and in the next
$\sqrt{n} - 1$ hops, visits all the other columns. The bit-entries of all the cells of the pointer chain
other than the one in this column are 0.

To find a 1-certificate, the algorithm tries to find columns with 0-cells on them, and adds those columns
to a set of discarded columns that it maintains. To this end, a first natural attempt is to repeatedly
sample a cell randomly from $M$, and if its bit-entry is 0, try to follow the pointer originating from that
cell. Following the chain, each time we visit a cell with bit-entry 0, we can discard the column on which
the cell lies. We can expect that, with high probability, after sampling $O(\sqrt{n})$ cells, we land on some
cell in the middle portion of the correct pointer chain that is contained in the 1-certificate (we call this
the principal chain). Then if we follow that pointer we spend $O(\sqrt{n})$ queries, and eliminate a constant
fraction of the existing columns.

The problem with this approach is the possible existence of long pointer chains other than the principal
chain. It may be the case that we land on one such chain, of length $\Omega(\sqrt{n})$, which passes entirely
through the columns that we have already discarded. Thus we end up spending $\Omega(\sqrt{n})$ queries, but
can discard only one column (the one we began from).

To bypass this problem, we start by observing that the principal chain passes through every column, and
hence in particular through every undiscarded column. Let $N$ be the number of undiscarded columns at
some stage of the algorithm. Note that the length of the principal chain is $\sqrt{n}$. Therefore if we start to
follow it from a randomly chosen cell on it, we expect to see an undiscarded column in roughly
another $\sqrt{n}/N$ hops. In view of this, we modify our algorithm as follows: while following a pointer
chain, we check if on average we are seeing one undiscarded column in every $\sqrt{n}/N$ hops. If this
check fails, we abandon following the pointer, sample another random cell from $M$, and continue. Our
procedure MilestoneTrace does this pointer traversal. We can prove that, conditioned on the event
that we land on the principal chain, the above traversal algorithm enables us to eliminate a constant
fraction of the existing undiscarded columns with high probability. We also show that spending about
$\sqrt{n}/N$ queries for each column we eliminate is enough for us to get the desired query complexity
bound.

After getting hold of the unique all 1's column, the final step is to check if all its bit-entries are indeed
1's, and if it can be completed into a full 1-certificate. That can clearly be done in $O(\sqrt{n})$ queries. The
VerifyColumn procedure does this.


3 Bounding $R_1(F)$


In this section we give the formal description and analysis of our one-sided error query algorithm for $F$:
Algorithm 1. Algorithm 1 uses two procedures: VerifyColumn and MilestoneTrace. As outlined
in the previous section, VerifyColumn, given a column, checks if all its bit-entries are 1 and whether
it can be completed into a 1-certificate. The MilestoneTrace procedure implements the pointer
traversal algorithm that we described in the previous section. We next describe the MilestoneTrace
procedure in a little more detail. We recall from the last section that the algorithm discards columns in
the course of its execution. We denote the set of undiscarded columns by $C$.

MilestoneTrace procedure

The functions of the variables used are as follows:

1. step: Contains the number of pointer-entries queried so far. A bit query is always accompanied
by a pointer query, unless the bit is 1, in which case the traversal stops. So up to a logarithmic factor,
the value in step gives us the number of bits queried.

2. seen: Set of columns that were undiscarded before the current run of MilestoneTrace, and that
have so far been seen and marked for discarding.

3. discard: Size of seen.
 1: procedure MilestoneTrace($M$, $C$, $i$, $j$)
 2:     Read $b_{i,j}$;
 3:     if $b_{i,j} = 1$ then return;
 4:     end if
 5:     step := 0;
 6:     discard := 1;
 7:     current := $(i,j)$;
 8:     seen := $\{j\}$;
 9:     while step $\le 100\sqrt{n} \cdot \text{discard} / |C|$ do
10:         read the pointer-entry of current;
11:         step ← step + 1;
12:         current ← pointer-entry of current;
13:         if current is $\perp$ then goto step 21;
14:         end if
15:         read bit-entry of current;
16:         if current is on a column $k$ in $C \setminus$ seen and bit-entry of current is 0 then
17:             seen ← seen $\cup \{k\}$;
18:             discard ← discard + 1;
19:         end if
20:     end while
21:     $C$ ← $C \setminus$ seen;
22: end procedure


procedure VerifyColumn($M$, $k$)
    Check if all the bit-entries of cells in the $k$-th column of $M$ are 1; if not, output 0;
    if all the pointer-entries of cells in the $k$-th column of $M$ are $\perp$ then
        Output 0;
    end if
    if the pointer chain starting from the first non-null pointer in column $k$ is valid then
        Output 1;
    else
        Output 0;
    end if
end procedure
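
The traversal performed by MilestoneTrace can be sketched in Python as below. This is a hedged illustration, not the authors' implementation: the matrix encoding (`bits`, `ptr` with `None` for $\perp$, 0-indexed), the function name `milestone_trace`, the returned query count, and the explicit check that `j` is undiscarded are all assumptions of this sketch. The loop condition is written with integers, `step * |C| <= 100 * sqrt(n) * discard`, to avoid floating point. The sketch also stops when a reached cell has bit-entry 1, following the prose description of `step`; the listed pseudocode does not spell out that case.

```python
def milestone_trace(bits, ptr, C, i, j):
    """One call of MilestoneTrace starting at cell (i, j).

    C is a set of undiscarded column indices and is updated in place.
    Returns the number of bit-entry and pointer-entry queries made.
    """
    side = len(bits)                       # side = sqrt(n)
    if j not in C:
        raise ValueError("column j must be an undiscarded column")
    queries = 1                            # read b_{i,j}
    if bits[i][j] == 1:
        return queries
    size = len(C)                          # |C| at the start of the call
    step, discard = 0, 1
    current, seen = (i, j), {j}
    while step * size <= 100 * side * discard:
        nxt = ptr[current[0]][current[1]]  # read the pointer-entry of current
        step += 1
        queries += 1
        if nxt is None:                    # the chain has ended
            break
        current = nxt
        queries += 1                       # read the bit-entry of current
        if bits[current[0]][current[1]] == 1:
            break                          # assumption: a 1-cell stops the traversal
        if current[1] in C and current[1] not in seen:
            seen.add(current[1])           # a new milestone: an undiscarded column
            discard += 1
    C -= seen                              # discard every column marked as seen
    return queries
```

Since `discard` never exceeds the size of `C` at the start of the call, the loop runs for at most about `100 * side` iterations, so the sketch always terminates even if pointers form a cycle.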
Algorithm 1

 1: $C$ := set of columns in $M$.
 2: for $t = 1$ to $O(\sqrt{n} \log n)$ do
 3:     if $|C| < 100$ then
 4:         goto step 10
 5:     end if
 6:     Sample a column $j$ from $C$ uniformly at random;
 7:     Sample $i \in [\sqrt{n}]$ uniformly at random;
 8:     MilestoneTrace($M$, $C$, $i$, $j$);
 9: end for
10: if $|C| \ge 100$ or $|C| = 0$ then
11:     Output 0;
12: else
13:     Read all columns in $C$;
14:     if there is a column $k$ with all bit-entries equal to 1 then
15:         VerifyColumn($M$, $k$);
16:     else
17:         Output 0;
18:     end if
19: end if


4. current: Contains the indices of the cell currently being considered.


The condition in the while loop checks that the number of queries spent is not too much larger than
$\frac{\sqrt{n}}{|C|} \cdot \text{discard}$ at any point in time. The if condition in line 13 checks whether the current
pointer-entry is null, i.e. whether the pointer chain has reached its end. If it is null, $C$ is updated, and
control returns to Algorithm 1.

To analyse Algorithm 1 we need to prove two statements about MilestoneTrace, which we now
informally state. Assume that the algorithm is run on a 1-input.

1. Conditioned on the event that a cell $(i,j)$ randomly chosen from the columns in $C$ is on the
principal chain, a call to MilestoneTrace($M$, $C$, $i$, $j$) serves to eliminate a constant fraction of surviving
columns with high probability.

2. It is enough to ensure that the average number of queries spent for each eliminated column is not
too much larger than $\sqrt{n}/|C|$. Note that $|C|$ is the number of undiscarded columns at the start of
the MilestoneTrace procedure.


In the following subsection, we prove that Algorithm 1 makes $\widetilde{O}(\sqrt{n})$ queries on every input. In the
next subsection we prove that Algorithm 1 succeeds with probability 1 on 0-inputs and with probability
at least 2/3 on 1-inputs. Lemma 4 follows from Lemma 6, Corollary 9 and Lemma 12.


3.1 Query complexity of Algorithm 1

In this subsection we analyse the query complexity of Algorithm 1. We bound the total number of $b_{i,j}$'s
and $p_{i,j}$'s read by the algorithm. Up to logarithmic factors, that is the total number of bits queried. For
the rest of this subsection, one query will mean one query to a bit-entry or a pointer-entry of some cell.
We first analyse the MilestoneTrace procedure. Recall that $C$ denotes the set of undiscarded columns.

Lemma 5. Let $i, j$ be such that $b_{i,j} = 0$. Let $Q$ and $D$ respectively be the number of queries made and the number of
columns discarded by a call to MilestoneTrace($M$, $C$, $i$, $j$). Then,

$$Q \le D \cdot \frac{200\sqrt{n}}{|C|} + 3.$$


Proof. We note that the variable step contains the number of pointer queries made so far, and the variable
discard maintains the number of columns marked so far for discarding. Every time the while loop is
entered, step $\le \frac{100\sqrt{n} \cdot \text{discard}}{|C|}$. In each iteration of the while loop, step goes up by 1. So at any point,
step $\le \frac{100\sqrt{n} \cdot \text{discard}}{|C|} + 1$. The lemma follows by observing that the total number of bit-entries queried
is at most one more than the total number of pointer-entries queried. □


We now use Lemma 5 to bound the total number of queries made by Algorithm 1.

Lemma 6. Algorithm 1 makes $\widetilde{O}(\sqrt{n})$ queries on each input.


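Proof. Whenever $b_{i,j} = 1$, MilestoneTrace($M$, $C$, $i$, $j$) returns after reading $b_{i,j}$. So the total number of
queries made by all calls to MilestoneTrace($M$, $C$, $i$, $j$) on such inputs is $\widetilde{O}(\sqrt{n})$.

After leaving the for loop, the total number of queries required to read the constantly many columns in $C$
and to run VerifyColumn is $O(\sqrt{n})$.

Since inside the for loop all the queries are made inside the MilestoneTrace procedure, it is enough
to show that the total number of queries made by all calls to MilestoneTrace($M$, $C$, $i$, $j$) on inputs for
which $b_{i,j} = 0$ is $\widetilde{O}(\sqrt{n})$.

Let $t = \widetilde{O}(\sqrt{n})$ be the total number of calls to MilestoneTrace on such inputs, made in the entire
run of Algorithm 1. Let $s_i$ be the value of $|C|$ when the $i$-th call to MilestoneTrace is made, and let
$s_{t+1}$ be the value of $|C|$ after the execution of the $t$-th call to MilestoneTrace completes. Let $\Delta s_i$ and
$\Delta q_i$ respectively be the number of columns discarded and the number of queries made in the $i$-th call to
MilestoneTrace. Since $C$ shrinks only when $b_{i,j} = 0$, we have $\Delta s_i = s_i - s_{i+1}$ for $i = 1, \ldots, t$. Since
$s_1 = \sqrt{n}$, we have that for $i = 2, \ldots, t$, $s_i = \sqrt{n} - \sum_{j=1}^{i-1} \Delta s_j$.

From Lemma 5 we have $\Delta q_i \le \Delta s_i \cdot \frac{200\sqrt{n}}{s_i} + 3$ for $i = 1, \ldots, t$. Substituting $\sqrt{n} - \sum_{j=1}^{i-1} \Delta s_j$ for $s_i$ when
$i > 1$, and adding, we have

$$
\begin{aligned}
\sum_{i=1}^{t} \Delta q_i
&\le 200\sqrt{n} \cdot \sum_{i=1}^{t} \frac{\Delta s_i}{s_i} + 3t \\
&= 200\sqrt{n} \cdot \left( \frac{\Delta s_1}{\sqrt{n}} + \sum_{i=2}^{t} \frac{\Delta s_i}{\sqrt{n} - \sum_{j=1}^{i-1} \Delta s_j} \right) + \widetilde{O}(\sqrt{n}) \\
&\le 200\sqrt{n} \cdot \Bigg( \frac{1}{\sqrt{n}} + \frac{1}{\sqrt{n} - 1} + \cdots + \frac{1}{\sqrt{n} - \Delta s_1 + 1} \\
&\qquad + \frac{1}{\sqrt{n} - \Delta s_1} + \frac{1}{\sqrt{n} - \Delta s_1 - 1} + \cdots + \frac{1}{\sqrt{n} - \Delta s_1 - \Delta s_2 + 1} \\
&\qquad + \cdots \\
&\qquad + \frac{1}{\sqrt{n} - \sum_{j=1}^{t-1} \Delta s_j} + \frac{1}{\sqrt{n} - \sum_{j=1}^{t-1} \Delta s_j - 1} + \cdots + \frac{1}{\sqrt{n} - \sum_{j=1}^{t-1} \Delta s_j - \Delta s_t + 1} \Bigg) + \widetilde{O}(\sqrt{n}) \\
&\le O(\sqrt{n}) \cdot \sum_{i=1}^{\sqrt{n}} \frac{1}{i} + \widetilde{O}(\sqrt{n}) \\
&= O(\sqrt{n} \log n) + \widetilde{O}(\sqrt{n}) \\
&= \widetilde{O}(\sqrt{n}).
\end{aligned}
$$

Hence proved. □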
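
The key step of the calculation above is that the sum of $\Delta s_i / s_i$ over all calls is at most the harmonic number $H_{\sqrt{n}}$, because each term $\Delta s_i / s_i$ is bounded by $\Delta s_i$ consecutive reciprocals. A small Python sketch checking this on random discard sequences; the helper names `discard_sum`, `harmonic` and `random_drops` are hypothetical and exact rational arithmetic is used to avoid rounding issues.

```python
import random
from fractions import Fraction

def discard_sum(start, drops):
    """Return sum_i drops[i] / s_i, where s_1 = start and s_{i+1} = s_i - drops[i]."""
    total, s = Fraction(0), start
    for d in drops:
        total += Fraction(d, s)
        s -= d
    return total

def harmonic(m):
    """The m-th harmonic number 1 + 1/2 + ... + 1/m as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, m + 1))

def random_drops(start, rng):
    """A random sequence of positive discard counts whose total equals start."""
    drops, s = [], start
    while s > 0:
        d = rng.randint(1, s)
        drops.append(d)
        s -= d
    return drops

rng = random.Random(0)
for _ in range(100):
    drops = random_drops(50, rng)
    # Each term d/s is at most 1/s + 1/(s-1) + ... + 1/(s-d+1).
    assert discard_sum(50, drops) <= harmonic(50)
```

Here `start` plays the role of $\sqrt{n}$ and `drops` the role of $\Delta s_1, \ldots, \Delta s_t$.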
3.2 Success Probability of Algorithm 1


In this section we prove that Algorithm 1 outputs the correct answer with probability 1 on 0-inputs and
with probability at least 2/3 on 1-inputs. We start by proving a probability statement (Lemma 7) that
will help us in the analysis.

Let $x_1, \ldots, x_\ell$ be non-negative real numbers and $\sum_{i=1}^{\ell} x_i = N$. We say that an index $I \in [\ell]$ is bad if there
exists a non-negative integer $0 \le D \le \ell - I$ such that

$$\sum_{i=I}^{I+D} x_i > 100(D+1) \cdot \frac{N}{\ell}.$$

We say that an index $I$ is good if $I$ is not bad.

Lemma 7. Let $I$ be chosen uniformly at random from $[\ell]$. Then,

$$\Pr[I \text{ is good}] \ge \frac{99}{100}.$$

Proof. We show the existence of a set $K = \{J_1, \ldots, J_t\}$ of disjoint sub-intervals of $[1, \ell]$ with integer
endpoints, having the following properties:

1. Every bad index is in some interval $J_i \in K$.

2. $\forall 1 \le i \le t$, $\sum_{j \in J_i} x_j > 100 |J_i| \cdot \frac{N}{\ell}$.

It then follows that the number of bad indices is upper bounded by $\sum_{i \in [t]} |J_i|$ (by property 1 and
disjointness of the intervals). But $N \ge \sum_{i \in [t]} \sum_{j \in J_i} x_j > 100 \cdot \frac{N}{\ell} \sum_{i \in [t]} |J_i|$, which gives us that
$\sum_{i \in [t]} |J_i| < \frac{\ell}{100}$. In the above chain of inequalities, the first inequality follows from the disjointness of the
$J_i$'s and the second inequality follows from property 2.

Now we describe a greedy procedure to obtain such a set $K$ of intervals. Let $j$ be the smallest bad
index. Then there exists a $d$ such that $\sum_{i \in [j, j+d]} x_i > 100(d+1) \cdot \frac{N}{\ell}$. We include the interval $[j, j+d]$ in $K$.
Then let $j'$ be the smallest bad index greater than $j + d$. Then there exists a $d'$ for which
$\sum_{i \in [j', j'+d']} x_i > 100(d'+1) \cdot \frac{N}{\ell}$. We include $[j', j'+d']$ in $K$. We continue in this way till there is no bad index which
is not already contained in some interval in $K$. It is easy to verify that the intervals in the set $K$ thus
formed are disjoint, and the set $K$ satisfies properties 1 and 2. □
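
The definition of a bad index can be checked by brute force on small sequences. The following Python sketch is an illustration only: the function name `bad_indices` is hypothetical, and the constant 100 is exposed as a parameter `c` (an assumption of this sketch, so that small examples already exhibit bad indices). The same counting argument as in the proof gives fewer than $\ell / c$ bad indices. Inputs are taken to be non-negative integers so that the comparison can be done exactly after clearing denominators.

```python
def bad_indices(x, c=100):
    """Return the 1-based bad indices of the sequence x.

    Index I is bad if some window x_I, ..., x_{I+D} has sum greater than
    c * (D + 1) * N / len(x), where N = sum(x).
    """
    ell, total = len(x), sum(x)
    bad = []
    for start in range(ell):
        window = 0
        for end in range(start, ell):
            window += x[end]
            # window > c * (D + 1) * N / ell, with D + 1 = end - start + 1
            if window * ell > c * (end - start + 1) * total:
                bad.append(start + 1)
                break
    return bad

# All the mass sits on the last entry: only the indices just before it are bad.
spike = [0] * 399 + [400]
```

On `spike` (with $\ell = N = 400$), a window ending at the last entry has sum 400, which exceeds $100(D+1)$ only for window lengths 1, 2 and 3, so exactly three of the 400 indices are bad.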


Let us begin by showing that Algorithm 1 is correct with probability 1 on 0-inputs of $F$.

Claim 8. If procedure VerifyColumn outputs 1 on inputs $M$ and $k$, then $M$ is a 1-input of $F$.

Proof. VerifyColumn outputs 1 only if the column $k$ has all its bit-entries equal to 1, and if the pointer
chain starting from the first non-null pointer-entry is valid (recall the definition of a valid pointer chain
from Section 1.2). From the definition of $F$, for such inputs $F$ evaluates to 1. □

Corollary 9. Let $M$ be a 0-input of $F$. Then Algorithm 1 outputs 0 with probability 1.

Proof. The corollary follows by observing that if Algorithm 1 returns 1, a call to VerifyColumn also
returns 1, and hence from Claim 8 the input is a 1-input of $F$. □

Let us now turn to 1-inputs of $F$. Let $M$ be a 1-input of $F$, that we fix for the rest of this subsection.
Without explicit mention, for the rest of the subsection we assume that Algorithm 1 is run on $M$. Since
$M$ is a 1-input, by the definition of $F$, there is a column $c$ such that all its bit-entries are 1, and the
pointer chain starting from the first non-null pointer-entry of $c$ is valid. Call this pointer chain the
principal chain. Let $(c = c_1, \ldots, c_{\sqrt{n}})$ be the order of the columns of $M$ in which the principal chain crosses
them. Let $(c = m_1, \ldots, m_{|C|})$ be the order of the columns of $C$ in which the principal chain crosses them.
Note that the column $c$ always belongs to $C$, as a column is discarded only if the bit-entry of some cell
on it is 0. Define $X_i$ to be the number of $c_j$'s between $m_i$ and $m_{i+1}$, including $m_i$, if $i < |C|$, and the
number of $c_j$'s after $m_i$, including $m_i$, if $i = |C|$. Clearly $\sum_{i=1}^{|C|} X_i = \sqrt{n}$.

Lemma 10. Let $(i,j)$ be a randomly chosen cell on the restriction of the principal chain to the columns in $C$ (i.e.
$j \in C$) and let $|C| = N \ge 100$. Then with probability at least $97/100$ over the choice of $(i,j)$, a run of the procedure
MilestoneTrace on inputs $M$, $C$, $i$, $j$ shrinks the size of $C$ to at most $99N/100$.

Proof. We apply Lemma 7 to the sequence $(X_i)_{i=1}^{|C|}$ described in the paragraph preceding this lemma;
here $j$ also denotes the position of column $j$ in the sequence $m_1, \ldots, m_{|C|}$. Except with probability at
most $1/100 + 1/100 + 1/|C| \le 3/100$, $j$ is a good index, $j \le 99N/100$ (i.e. the
column $j$ has at least $N/100$ columns of $C$ ahead of it on the principal chain), and $j \ne c$. Since $j \ne c$, the
bit-entry of the cell sampled is 0, and hence procedure MilestoneTrace does not return control in
step 3. In the procedure MilestoneTrace, if current is on the principal chain, the condition in line 13
cannot be satisfied unless current is the last cell on the chain. Now, if the condition in the while loop is
violated while current is on the principal chain, it implies that $j$ is a bad index. Thus with probability at
least $1 - 3/100 = 97/100$, the procedure does not terminate as long as all the $N/100$ columns ahead of $j$
are not seen. Since all columns in $C$ that are seen are discarded, we have the lemma. □


Now, let us bound the number of iterations of the for loop of Algorithm 1 required to shrink $|C|$ by a
factor of 1/100.

Lemma 11. Assume that at a stage of execution of Algorithm 1 where the control is at the beginning of the for
loop, $|C| = N$. Then except with probability $1/25$, after $10\sqrt{n}$ iterations of the for loop, $|C|$ will become at most
$99N/100$.

Proof. The probability that a cell on the principal chain is sampled in steps 6 and 7 is $\frac{1}{\sqrt{n}}$. So the
probability that in none of the $10\sqrt{n}$ executions of steps 6 and 7 a cell on the principal chain is picked is
$(1 - \frac{1}{\sqrt{n}})^{10\sqrt{n}} < \frac{1}{100}$. Conditioned on the event that a cell on the principal chain is sampled, from
Lemma 10, except with probability 3/100, $|C|$ reduces by a factor of 1/100 in the following run of
MilestoneTrace. By a union bound we have that except with probability $1/100 + 3/100 = 1/25$, after $10\sqrt{n}$
iterations of the for loop, $|C| \le 99N/100$. □
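
One way to see the numerical bound used in this proof is through the standard inequality $1 - x \le e^{-x}$; the worked step below is a sketch of that calculation together with the final union bound.

```latex
\left(1 - \frac{1}{\sqrt{n}}\right)^{10\sqrt{n}}
  \le \left(e^{-1/\sqrt{n}}\right)^{10\sqrt{n}}
  = e^{-10}
  < \frac{1}{100},
\qquad
\frac{1}{100} + \frac{3}{100} = \frac{1}{25}.
```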


Let $t$ be the minimum integer such that $\sqrt{n} \cdot (99/100)^t < 100$. Thus $t = O(\log n)$. For $i = 1, \ldots, t$, let the
random variable $Y_i$ be equal to the index of the first iteration of the for loop of Algorithm 1 after which
$|C| \le \sqrt{n} \cdot (99/100)^i$. Let $Z_1 = Y_1$ and for $i = 2, \ldots, t$ define $Z_i = Y_i - Y_{i-1}$. From Lemma 11, for each $i$ we
have $E[Z_i] \le 25 \times 10\sqrt{n} = O(\sqrt{n})$. By linearity of expectation, we have $E[\sum_{i=1}^{t} Z_i] = O(\sqrt{n} \log n)$. By
Markov's inequality, with probability at least 2/3, $\sum_{i=1}^{t} Z_i = O(\sqrt{n} \log n)$. Thus, if we choose the constant
hidden in the number of iterations of the for loop of Algorithm 1 large enough, then with probability at
least 2/3, $|C|$ shrinks to less than 100. Then the algorithm reads all the columns in $C$ and, through the
VerifyColumn procedure, outputs the correct value of $F$. Thus we have proved the following Lemma.


Lemma 12. With probability at least 2/3, Algorithm 1 outputs 1 on a 1-input.
4 Zero error query complexity of $F$


We first present a randomized query algorithm which satisfies the following: If the algorithm outputs 0
then the given input is a 0-input (the algorithm actually exhibits a 0-certificate), and if the given input is
a 0-input, then the algorithm outputs 0 with high probability. This algorithm, Algorithm 2, makes
$\widetilde{O}(n^{3/4})$ queries in the worst case. For the randomized zero-error algorithm we run Algorithm 1 and
Algorithm 2 one after another. If Algorithm 1 outputs 1 then we stop and output 1. Else, if Algorithm 2 says 0, we stop and
output 0. Otherwise, we repeat. By the standard argument of ZPP = RP ∩ coRP we get the randomized
zero-error algorithm. Though the query complexity of Algorithm 1 is $\widetilde{O}(\sqrt{n})$, we get the zero-error
query complexity of $F$ to be $\widetilde{O}(n^{3/4})$ because of the query complexity of Algorithm 2.
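
The repeat-until-certified scheme described above can be sketched in Python as follows. This is a generic illustration of the ZPP = RP ∩ coRP argument; the names `zero_error`, `toy_one` and `toy_zero` are hypothetical, and the two toy procedures merely stand in for Algorithm 1 and Algorithm 2 (they receive the true value of the function as their "input" and succeed with probability 2/3).

```python
import random

def zero_error(alg_one, alg_zero, x, rng):
    """Alternate two one-sided-error algorithms until one answer is certified.

    alg_one(x, rng) outputs 1 only on 1-inputs (the role of Algorithm 1);
    alg_zero(x, rng) outputs 0 only on 0-inputs (the role of Algorithm 2).
    Returns (answer, number_of_rounds); the answer is never wrong.
    """
    rounds = 0
    while True:
        rounds += 1
        if alg_one(x, rng) == 1:
            return 1, rounds
        if alg_zero(x, rng) == 0:
            return 0, rounds

def toy_one(x, rng):
    """Toy stand-in: finds a 1-certificate with probability 2/3 on a 1-input."""
    return 1 if x == 1 and rng.random() < 2 / 3 else 0

def toy_zero(x, rng):
    """Toy stand-in: finds a 0-certificate with probability 2/3 on a 0-input."""
    return 0 if x == 0 and rng.random() < 2 / 3 else 1
```

Each round succeeds with probability at least 2/3, so the expected number of rounds is at most 3/2, and the expected cost is within a constant factor of the more expensive of the two algorithms.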

Now we define the notion of column covering and column span which we will use next. 

Definition 13. For two columns $C_i$ and $C_j$ in input matrix $M$, we say $C_j$ is covered by $C_i$ if there is a cell $(k, i)$
in $C_i$ and a sequence $(\alpha_1, \delta_1), \ldots, (\alpha_t, \delta_t)$ of pairs from $[\sqrt{n}] \times [\sqrt{n}]$ such that:

1. $b_{k,i} = 0$,

2. $\delta_t = j$,

3. for all $\ell \in [t]$, $b_{\alpha_\ell, \delta_\ell} = 0$ and

4. $p_{k,i} = (\alpha_1, \delta_1)$ and for $\ell = 1, \ldots, t-1$, $p_{\alpha_\ell, \delta_\ell} = (\alpha_{\ell+1}, \delta_{\ell+1})$.

Definition 14. For a column $C$, we define $\mathrm{Span}_C$ to be the subset of columns in $M$ which consists of $C$ and any
column which is covered by $C$.
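
As a hedged illustration of Definitions 13 and 14, the span of a column can be computed by following, from every 0-cell of that column, the pointer chain through 0-cells. The matrix encoding (`bits`, `ptr` with `None` for the null pointer, 0-indexed) and the function name `span` are assumptions of this sketch; it reads the whole column and is not meant to reflect the query budget of the algorithm.

```python
def span(bits, ptr, col):
    """Return Span of column col: col itself plus every column covered by it."""
    side = len(bits)
    covered = {col}
    for k in range(side):
        if bits[k][col] != 0:
            continue                      # a covering chain starts at a 0-cell of col
        visited = {(k, col)}
        cell = ptr[k][col]
        while cell is not None and cell not in visited:
            if bits[cell[0]][cell[1]] != 0:
                break                     # every cell on the chain must have bit 0
            covered.add(cell[1])          # the column of this cell is covered
            visited.add(cell)             # guards against cyclic pointer chains
            cell = ptr[cell[0]][cell[1]]
    return covered

# Column 0 covers column 1; columns 1 and 2 cover nothing but themselves.
bits = [[0, 0, 1], [1, 1, 1], [1, 1, 0]]
ptr = [[(0, 1), None, None], [None, None, None], [None, None, None]]
```

In this example columns 1 and 2 are not in each other's span, which is the kind of pair the second phase of the algorithm looks for.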

We first give an informal description of the algorithm and then we proceed to formally analyze the
algorithm in Section 4.1. As mentioned before, this is also a one-sided algorithm, i.e., it errs on one side,
but it errs on the opposite side to that of Algorithm 1. The 0-certificates it attempts to capture are as
follows:

1. If each of the columns has a cell with bit-entry 0, then the function evaluates to 0. Those bit-entries
form a 0-certificate. If there are many 0's in each column, the algorithm may capture such a
certificate in the first phase (sparsification).

2. Two columns $C_1$ and $C_2$ in $M$ such that $C_1 \notin \mathrm{Span}_{C_2}$ and $C_2 \notin \mathrm{Span}_{C_1}$. Existence of two such
columns makes the existence of a valid pointer chain impossible. This is captured in the second
phase of the algorithm.

3. Lastly, if there is a column all of whose bit-entries are 1, which does not have a valid pointer chain,
then that is also a 0-certificate. The algorithm may capture such a certificate in the last phase.


The algorithm proceeds as follows: The main goal of the algorithm is to eliminate any column where
it finds a 0 in any of its cells. First the algorithm filters out, with high probability, the columns with a
large number of 0's by random sampling. The algorithm probes $\widetilde{O}(n^{1/4})$ locations at random in each column
and if it finds any 0 in any column, it eliminates that column. This step is called sparsification. After
sparsification, we are guaranteed that all the columns have a small number of 0's. Now the remaining
columns can have either of the following two characteristics. First, a large number of the columns in the
existing column set may have a large span. This implies that if we choose a column randomly from the existing
columns, the column will span a large number of columns (i.e., a constant fraction of existing columns)
with high probability and we can eliminate all of them. The algorithm does exactly this in procedure
A of the second phase. The other case is where most of the columns have small spans. We can show
that if this is the case, then if we pick two random columns $C_i$ and $C_j$ from the set of existing columns,
$C_i$ will not lie in the span of $C_j$ and vice versa with high probability, certifying that $F$ is 0. This case is
taken care of in procedure B of the second phase of the algorithm.
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The algorithm runs procedure A and procedure B one after another for logarithmic number of steps. If 
at any point of the iteration, the algorithm finds two columns which are not in span of each other, the 
algorithm outputs 0 and terminates. Otherwise, as the procedure A eliminates the number of existing 
columns by a constant factor in each iteration, with logarithmic number of iteration, either we com¬ 
pletely exhaust the column set, which is again a O-certificate, or we are left with a single column. Then 
the algorithm checks the remaining column and the validity of the pointer chain if that column is an all 
l's column and answers accordingly. This captures the third kind of O-certificate as mentioned before. 
In Algorithrrf2l we set r to be the least number such that \fn ■ () T < 1. Clearly r = O (log n). 
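The sparsification phase and the choice of the number of rounds can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `sparsify` and `num_rounds` are hypothetical, only the bit-entries are modelled, sampling is done with replacement, the logarithm is taken to be natural, and the shrink factor 99/100 is assumed from the fact that procedure A removes a 1/100 fraction of the columns.

```python
import math
import random


def sparsify(bits, rng=None):
    """Return indices of columns in which no sampled cell has bit-entry 0.

    bits[r][c] is the bit-entry of cell (r, c) of a square matrix whose
    side length plays the role of sqrt(n).
    """
    rng = rng or random.Random()
    side = len(bits)
    if side == 0:
        return []
    n = side * side
    # T = 10 * n^(1/4) * log n; using the natural logarithm is an assumption.
    T = max(1, math.ceil(10 * n ** 0.25 * math.log(n)))
    survivors = []
    for c in range(side):
        sampled_rows = [rng.randrange(side) for _ in range(T)]  # with replacement
        if all(bits[r][c] == 1 for r in sampled_rows):
            survivors.append(c)  # no 0 was seen, so the column is kept
    return survivors


def num_rounds(n, shrink=0.99):
    """Least tau such that sqrt(n) * shrink**tau < 1 (shrink is an assumed constant)."""
    tau = 0
    remaining = math.sqrt(n)
    while remaining >= 1:
        remaining *= shrink
        tau += 1
    return tau
```

A column consisting only of 1's always survives `sparsify`, and a column consisting only of 0's is always removed; columns in between are removed with a probability that grows with their number of 0's.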


4.1 Analysis 

We first bound the query complexity of the algorithm.

Claim 15. The query complexity of Algorithm 2 is $\widetilde{O}(n^{3/4})$ in the worst case.


Proof. We count the number of bit-entries and pointer-entries of the input matrix that the algorithm probes.
Up to a logarithmic factor, that is asymptotically the same as the number of bits queried.

The first for loop runs for $\sqrt{n}$ iterations and in each iteration samples $T$ cells from a column. So the
number of probes of the first for loop is $O(\sqrt{n} \times T) = \widetilde{O}(n^{3/4})$.

In procedure A, the number of probes needed to scan the column and to trace pointers from the column
is $O(n^{3/4})$. In procedure B, the algorithm has to check the span of two columns, which takes $O(n^{3/4})$
probes. The number of iterations of the for loop of line 9 is at most $\tau = O(\log n)$. Hence the total number
of probes made inside the for loop is $\widetilde{O}(n^{3/4})$.

Lastly, VerifyColumn takes $O(\sqrt{n})$ probes. So the total number of probes is bounded by $\widetilde{O}(n^{3/4})$.
Thus the claim follows. □
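The accounting in the proof above can be summarized in a single line. This is only a sketch of the bookkeeping, assuming $T = 10 \cdot n^{1/4} \log n$ as in line 4 of Algorithm 2, $\tau = O(\log n)$ rounds, and $a \log\log n$ repetitions of procedure A per round:

```latex
\underbrace{\sqrt{n}\cdot T}_{\text{sparsification}}
\;+\;
\underbrace{\tau \cdot \bigl(a \log\log n \cdot O(n^{3/4}) + O(n^{3/4})\bigr)}_{\text{procedures A and B over all rounds}}
\;+\;
\underbrace{O(\sqrt{n})}_{\text{final VerifyColumn}}
\;=\; \widetilde{O}(n^{3/4}).
```

Every term is $n^{3/4}$ times a factor polylogarithmic in $n$, which is what the $\widetilde{O}$ notation absorbs.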

The first for loop, i.e., lines 3 to 8, is called sparsification. We have the following guarantee after sparsification.

Claim 16. After the sparsification, with probability at least 99/100, every column in $\mathcal{C}$ has at most $n^{1/4}$ cells
with bit-entry 0.

Proof. We bound the probability that all the $T$ probes in a column output 1, conditioned on the
column having more than $n^{1/4}$ 0's. A single probe in such a column outputs 0 with probability at
least $1/n^{1/4}$. Hence all the probes output 1 with probability at most $(1 - 1/n^{1/4})^{T} < \frac{1}{100\sqrt{n}}$. By union bound,
this happens to some column in $M$ with probability at most $1/100$. □
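For completeness, the estimate used in the proof can be checked as follows. This is a sketch assuming $T = 10 \cdot n^{1/4} \log n$ with the natural logarithm (a larger base only makes the bound smaller) and using $1 - x \le e^{-x}$:

```latex
\Bigl(1 - \frac{1}{n^{1/4}}\Bigr)^{T}
\;\le\; \exp\Bigl(-\frac{T}{n^{1/4}}\Bigr)
\;=\; \exp(-10 \log n)
\;=\; n^{-10},
\qquad
\sqrt{n} \cdot n^{-10} \;=\; n^{-9.5} \;\le\; \frac{1}{100} \quad \text{for } n \ge 2.
```

The factor $\sqrt{n}$ in the second expression is the union bound over the $\sqrt{n}$ columns of $M$.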


This implies that, except with probability 1/100, the if conditions of lines 20 and 32 are never satisfied.

Claim 17. At least one of the following is true in each iteration of the for loop of line 9:

1. For a random column $C \in \mathcal{C}$, $|\mathrm{Span}_C| > |\mathcal{C}|/100$ with probability at least 1/100.

2. For two randomly picked columns $C_i$ and $C_j$ in $\mathcal{C}$, with probability at least 24/25, $C_j \notin \mathrm{Span}_{C_i}$ and
$C_i \notin \mathrm{Span}_{C_j}$.

Algorithm 2

 1: $\mathcal{C} \leftarrow$ set of columns in $M$;
 2: $\tau \leftarrow$ least number such that $\sqrt{n} \cdot (99/100)^{\tau} < 1$;
 3: for each column $C$ in $\mathcal{C}$ do
 4:     Sample $T = 10 \cdot n^{1/4} \log n$ cells uniformly at random;
 5:     if any bit-entry of any cell is 0 then
 6:         $\mathcal{C} \leftarrow \mathcal{C} \setminus \{C\}$;
 7:     end if
 8: end for
 9: for $t = 1$ to $\tau$ do
10:     if $|\mathcal{C}| \le 1$ then
11:         goto step 40
12:     end if
13:     repeat
14:         procedure A
15:             Sample a column $C$ from $\mathcal{C}$ uniformly at random;
16:             Read all entries of all cells of $C$;
17:             if all bit-entries are 1 then
18:                 VerifyColumn($M$, $C$);
19:             end if
20:             if number of 0 bit-entries in $C$ $> n^{1/4}$ then
21:                 Output 1 and abort;
22:             end if
23:             For each cell of $C$ with bit-entry 0, trace pointers and compute $\mathrm{Span}_C$;
24:             $\mathcal{C} \leftarrow \mathcal{C} \setminus \mathrm{Span}_C$;
25:         end procedure
26:     until $a \log\log n$ times
27:     procedure B
28:         Pick two columns $C_1$ and $C_2$ uniformly at random from $\mathcal{C}$;
29:         if all bit-entries of $C_1$ ($C_2$) are 1 then
30:             VerifyColumn($M$, $C_1$) (VerifyColumn($M$, $C_2$));
31:         end if
32:         if number of 0 bit-entries in $C_1$ or $C_2$ $> n^{1/4}$ then
33:             Output 1 and abort;
34:         end if
35:         if $C_2 \notin \mathrm{Span}_{C_1}$ and $C_1 \notin \mathrm{Span}_{C_2}$ then
36:             Output 0 and abort;
37:         end if
38:     end procedure
39: end for
40: if $\mathcal{C} = \emptyset$ then
41:     Output 0;
42: end if
43: if $|\mathcal{C}| = 1$ then
44:     Let $\mathcal{C} = \{C\}$.
45:     VerifyColumn($M$, $C$);
46: end if
47: Output 1.

Proof. Suppose (1) does not hold. For two random columns $C_i$ and $C_j$, let $L_i$ ($L_j$) be the event that
$|\mathrm{Span}_{C_i}|$ ($|\mathrm{Span}_{C_j}|$) $> |\mathcal{C}|/100$. Let $E_{i,j}$ ($E_{j,i}$) be the event that $C_j \in \mathrm{Span}_{C_i}$ ($C_i \in \mathrm{Span}_{C_j}$). Thus we have,

$$\mathbb{P}\{E_{i,j}\} = \mathbb{P}\{L_i\} \cdot \mathbb{P}\{E_{i,j} \mid L_i\} + \mathbb{P}\{\overline{L_i}\} \cdot \mathbb{P}\{E_{i,j} \mid \overline{L_i}\}$$
$$\le \mathbb{P}\{L_i\} + \mathbb{P}\{E_{i,j} \mid \overline{L_i}\}$$
$$\le \frac{1}{100} + \frac{1}{100} = \frac{1}{50}.$$

Similarly $\mathbb{P}\{E_{j,i}\} \le \frac{1}{50}$. By union bound, (2) is true. □
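The final union-bound step can be written out explicitly. This is a short sketch using the events $E_{i,j}$ and $E_{j,i}$ from the proof, each of which has probability at most $1/50$ when (1) fails:

```latex
\mathbb{P}\{E_{i,j} \cup E_{j,i}\}
\;\le\; \mathbb{P}\{E_{i,j}\} + \mathbb{P}\{E_{j,i}\}
\;\le\; \frac{1}{50} + \frac{1}{50}
\;=\; \frac{1}{25},
\qquad\text{so}\qquad
\mathbb{P}\{\overline{E_{i,j}} \cap \overline{E_{j,i}}\} \;\ge\; \frac{24}{25}.
```

The event on the right is exactly the one in condition (2): neither column lies in the span of the other.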


Now we are ready to prove the correctness of the algorithm.

Claim 18. Given a 0-input, Algorithm 2 outputs 0 with probability at least 19/20.

Proof. We first note that after the execution of the for loop in line 3, except with probability at most 1/100,
there is no column in $\mathcal{C}$ having more than $n^{1/4}$ cells with bit-entry 0.

If the algorithm finds a column all of whose bit-entries are 1, it gives the correct output by a run of
VerifyColumn.

Next, we note that if in any iteration of the for loop (line 9) condition (2) of Claim 17 is satisfied, then
we find a 0-certificate (i.e., a pair of columns neither of which lies in the span of the other) with probability
at least 24/25.

Finally, assume that for each iteration of the for loop, condition (2) is not satisfied. This implies that for
each iteration of the loop, condition (1) is satisfied (from Claim 17). As we run procedure A $a \log\log n$
times, with probability at least $1 - (99/100)^{a \log\log n} > 1 - \frac{1}{100\tau}$ (for an appropriate setting of the constant $a$) we
land on a column whose span is at least $|\mathcal{C}|/100$, and hence we eliminate a 1/100 fraction of the columns
in $\mathcal{C}$ in one of the iterations of the inner repeat loop (line 13). By union bound, the probability that
there is even one bad repeat loop, in which we do not eliminate $|\mathcal{C}|/100$ columns, is at most 1/100. Thus
the probability that $|\mathcal{C}| > 1$ after the execution of the for loop is over is at most 1/100. So the total error
probability is bounded by $1/100 + \max\{1/25, 1/100\} = 1/20$, from which the claim follows. □
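The arithmetic of the error budget can be verified exactly with rational numbers. The snippet below is only a sanity check of the constants quoted in the proof; the variable names are hypothetical labels for the three failure events.

```python
from fractions import Fraction

# Failure probabilities quoted in the proof of Claim 18.
sparsification_failure = Fraction(1, 100)  # some column keeps too many 0's
procedure_b_failure = Fraction(1, 25)      # condition (2) holds but no certificate is found
procedure_a_failure = Fraction(1, 100)     # some round fails to shrink the column set

# Only one of the two second-phase cases applies, hence the max.
total_error = sparsification_failure + max(procedure_b_failure, procedure_a_failure)
success_probability = 1 - total_error
```

Here `total_error` equals 1/20, so the success probability on a 0-input is 19/20, matching the statement of Claim 18.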

Claim 19. Given a 1-input, Algorithm 2 outputs 1 with probability 1.

Proof. The proof of this claim is straightforward. As mentioned before, Algorithm 2 outputs 0 only if
it finds a 0-certificate. As there is no 0-certificate for a 1-input, the algorithm outputs 1. □

Lemma 3 follows by combining Claim 18 and Claim 19.

Acknowledgements: We thank Arkadev Chattopadhyay, Prahladh Harsha and Srikanth Srinivasan for
useful discussions.